# Load required packages (they must already be installed)
library(ggplot2)
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
library(ggcorrplot)
library(leaflet)
library(dplyr)  # provides group_by(), summarise() and across() used in the data preparation

Introduction

Colchester, a vibrant town with a rich history, also faces crime-related issues. To establish effective crime prevention and intervention measures, policymakers, law enforcement agencies, and the community must first understand the patterns and trends in crime data. The dataset examined here combines crime incident details with meteorological conditions recorded throughout 2023. Through analysis and visualization, we seek to uncover patterns, trends, and relationships in the data that offer insight into crime dynamics and the factors that may influence criminal activity.

Overview of Crime and temperature data

The crime dataset for Colchester gives an overview of the criminal activity reported throughout the year. It covers a wide range of categories, including anti-social behavior, burglary, vehicle crime, shoplifting, and drug-related offenses, among others. Each entry records the type of crime, its location, the month of occurrence, and the outcome status. The temperature dataset complements the crime data with meteorological information recorded in Colchester over the same period. It includes parameters such as temperature, precipitation, wind speed, visibility, and atmospheric pressure. Weather data is typically collected at regular intervals, such as hourly or daily, which allows seasonal variations and weather patterns to be examined.

Data Collection and Preparation

We prepare the crime and temperature data for analysis by removing columns that are not needed, averaging the weather readings by month, and merging the two tables on the month. This keeps the focus on the information needed to understand how crime and weather relate in Colchester.

# Set the working directory to the location of the data files

setwd("C:/Users/admin/OneDrive/Desktop/data visulisation")
# Read the crime data from the CSV file into the 'crimedata' data frame
crimedata<-read.csv('crime23.csv')
# Read the temperature data from the CSV file into the 'tempdata' data frame
tempdata<-read.csv("temp2023.csv")


# Extract the date information in YYYY-MM format from the 'Date' column in 'tempdata'
tempdata$Date<-substr(tempdata$Date, start=1,stop=7)

# Rename the 'Date' column to 'date' in 'tempdata' for consistency
colnames(tempdata)[which(names(tempdata) == "Date")] <- "date"

# Remove unnecessary columns from 'crimedata'
# (setdiff() keeps all columns if the named one is absent, unlike -which())
crimedata <- crimedata[, setdiff(names(crimedata), "context"), drop = FALSE]

# Remove unnecessary columns from 'tempdata'
drop_cols <- c("PreselevHp", "SnowDepcm", "WindkmhDir", "SunD1h")
tempdata <- tempdata[, setdiff(names(tempdata), drop_cols), drop = FALSE]
# Summarize numeric columns in 'tempdata' 
tempdata_new <- tempdata %>% 
   group_by(date) %>% 
   summarise(across(where(is.numeric),~mean(.x, na.rm=TRUE)))

# Merge the crime data with the summarized temperature data
 combined_df <- merge(x = crimedata, y = tempdata_new, by = "date", all.x = TRUE)
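The aggregate-then-join step can be checked on a small example. The sketch below uses base R only and made-up values (the `toy_*` names and numbers are hypothetical, not taken from the CSV files); it shows that a left join on the month keeps one row per crime and leaves `NA` where no weather is available.

```r
# Toy daily weather readings (hypothetical values, not from temp2023.csv)
toy_temp <- data.frame(
  Date = c("2023-01-01", "2023-01-02", "2023-02-01", "2023-02-02"),
  TemperatureCAvg = c(4, 6, 7, NA)
)

# Toy crime records, one row per reported incident
toy_crime <- data.frame(
  category = c("burglary", "drugs", "robbery", "burglary"),
  date = c("2023-01", "2023-01", "2023-02", "2023-03")
)

# Collapse daily readings to one row per month (YYYY-MM);
# the formula interface of aggregate() drops rows with NA by default
toy_temp$date <- substr(toy_temp$Date, 1, 7)
toy_monthly <- aggregate(TemperatureCAvg ~ date, data = toy_temp, FUN = mean)

# Left join: every crime row is kept, months without weather get NA
toy_combined <- merge(toy_crime, toy_monthly, by = "date", all.x = TRUE)

# Sanity check: the join neither lost nor duplicated crime rows
stopifnot(nrow(toy_combined) == nrow(toy_crime))
print(toy_combined)
```

The same row-count check, `nrow(combined_df) == nrow(crimedata)`, is a cheap guard against duplicated keys in the monthly weather table.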

Visualisation of the data

# Create a two-way frequency table to analyze the distribution of categories over months
twowaytable<-table(combined_df$category,combined_df$date)
print(twowaytable)
##                        
##                         2023-01 2023-02 2023-03 2023-04 2023-05 2023-06 2023-07
##   anti-social-behaviour      46      49      21      53      67      52      76
##   bicycle-theft              20      14      19      16      16      14      15
##   burglary                   17      22      14      22      15      26      14
##   criminal-damage-arson      59      37      52      63      64      42      42
##   drugs                      14      17      21      21      22      15      17
##   other-crime                 7       5       6      15       3      11      12
##   other-theft                48      37      35      38      42      41      51
##   possession-of-weapons       3       3      11       5       7       3       8
##   public-order               45      42      58      51      37      36      40
##   robbery                     8       7       8       7       7      17       6
##   shoplifting                76      31      51      40      51      59      33
##   theft-from-the-person       6       7      12       7       5       6       9
##   vehicle-crime              65      15      21      29      24      45      25
##   violent-crime             237     181     226     207     226     196     236
##                        
##                         2023-08 2023-09 2023-10 2023-11 2023-12
##   anti-social-behaviour      71      90      68      39      45
##   bicycle-theft              21      37      26      27      10
##   burglary                   20      18      31      11      15
##   criminal-damage-arson      33      47      45      53      44
##   drugs                       7      25      19      13      17
##   other-crime                 9       7       6       5       6
##   other-theft                41      34      49      37      38
##   possession-of-weapons       5       8       6       8       7
##   public-order               41      45      52      45      40
##   robbery                     5       8       9       5       7
##   shoplifting                57      33      43      39      41
##   theft-from-the-person       5       7       3       4       5
##   vehicle-crime              16      20      26      56      64
##   violent-crime             219     263     209     221     212

The table shows that violent crime is by far the most frequent category in every month, peaking in September at 263 incidents and falling to its lowest in February at 181. Among property crimes, shoplifting, other theft, and vehicle crime are reported often, while burglary and theft from the person are less frequent. Shoplifting spikes in January with 76 incidents. Some categories vary over the year: bicycle theft peaks in September with 37 incidents, and anti-social behavior also peaks in September with 90 incidents, compared with only 39 in November. These monthly patterns indicate when prevention efforts are likely to matter most.
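Peak months can be read off programmatically rather than by eye. The sketch below is a minimal illustration on a three-category, three-month excerpt of the counts printed above (the `counts` matrix is typed in by hand for the example); the same two `apply()` calls should work on `twowaytable` itself.

```r
# Excerpt of the printed two-way table: rows are categories, columns are months
counts <- matrix(
  c( 46,  90,  39,
     20,  37,  27,
    237, 263, 221),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    category = c("anti-social-behaviour", "bicycle-theft", "violent-crime"),
    month = c("2023-01", "2023-09", "2023-11")
  )
)

# For each category (row), find the month with the largest count
peak_month <- colnames(counts)[apply(counts, 1, which.max)]
peak_count <- apply(counts, 1, max)

peaks <- data.frame(category = rownames(counts), peak_month, peak_count)
print(peaks)
```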

Bar plot

# Create a bar plot to visualize the total frequency of crime categories

ggplot(combined_df, aes(x = category)) +
  geom_bar(fill = "blue") + # Adding bars with blue color
  # Add total frequency exactly above each bar  
  geom_text(stat = 'count', aes(label = after_stat(count)), vjust = -0.2, position = position_dodge(width = 1)) + 
  labs(title = "Analysis of Crime Category Frequencies", # Adding title, x-axis label, and y-axis label
       x = "Crime Category",
       y = "Frequency") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Rotate x-axis labels for better readability

The bar graph displays the number of crimes reported in each category; the totals match the row sums of the monthly table above. Violent crime is the most common, with 2633 incidents. Anti-social behavior comes next with 677 incidents, then criminal damage/arson with 581, shoplifting with 554, public order with 532, other theft with 491, and vehicle crime with 406. Burglary accounts for 225 incidents. Possession of weapons is reported the least, only 74 times. Theft from the person is reported 76 times and other crime 92 times, making them relatively rare as well.
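The totals shown on the bars are the row sums of the monthly table, so they can be cross-checked without the plot. A minimal sketch, using three rows copied by hand from the table above (on the full data, `sort(rowSums(twowaytable), decreasing = TRUE)` would be the equivalent call):

```r
# Monthly counts (January to December) copied from the printed table
monthly <- rbind(
  "possession-of-weapons" = c(3, 3, 11, 5, 7, 3, 8, 5, 8, 6, 8, 7),
  "theft-from-the-person" = c(6, 7, 12, 7, 5, 6, 9, 5, 7, 3, 4, 5),
  "violent-crime"         = c(237, 181, 226, 207, 226, 196,
                              236, 219, 263, 209, 221, 212)
)

# Yearly total per category, largest first
totals <- sort(rowSums(monthly), decreasing = TRUE)
print(totals)
```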

Density plot

##density plot##

# Create a density plot of average temperature
density_plot <- ggplot(combined_df, aes(x = TemperatureCAvg)) +
  geom_density(alpha = 0.6) + 
  labs(title = "Density Plot of Average Temperature",
       x = "Average Temperature (°C)",
       y = "Density") +
  theme_minimal()

# Convert ggplot object to plotly object
plotly_density <- ggplotly(density_plot)

# Display the interactive plot
plotly_density

The x-axis spans average temperatures from about 5 to 17.5 degrees Celsius. Because each row of the combined data is a crime record carrying the average temperature of its month, the density describes how crime records are spread across monthly temperatures. At approximately 6.8 degrees Celsius the density is 0.15. The highest peak is at 17 degrees Celsius, with a density of 0.17. The lowest density, 0.005, is at 9.8 degrees Celsius. A smaller peak is visible at 12 degrees Celsius, where the density is 0.06.

Violin plot

##violin plot##
# Creating the violin 
v_plot <- ggplot(combined_df, aes(x = category, y = TemperatureCAvg)) +
  geom_violin(trim = FALSE) +
  labs(title = "Violin Plot of average temperature by Crime Category",
       x = "Crime Category",  
       y = "Average Temperature") +  # Set titles for plot and axes
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels

# Make the plot interactive using plotly
v_plot <- ggplotly(v_plot)

# Display the interactive plot
v_plot

The violin plot shows how the average temperature is distributed within each crime category; it does not by itself measure a correlation. Each "violin" represents one crime category, and its width at a given temperature reflects the share of that category's incidents recorded at that temperature. Wider sections indicate relatively more incidents at that temperature, whereas narrower sections indicate fewer. Comparing the shapes across categories makes it possible to spot patterns or irregularities, for instance whether certain types of crime tend to occur more often at higher or lower temperatures, which points to possible links between temperature and criminal activity.
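A numeric summary per category makes the comparison between violins concrete. The sketch below uses made-up values in a hypothetical `toy` data frame; with the real data, the same `tapply()` calls could be applied to `combined_df$TemperatureCAvg` and `combined_df$category`.

```r
# Toy crime records with the average temperature of their month (made-up values)
toy <- data.frame(
  category = c("drugs", "drugs", "drugs", "robbery", "robbery", "robbery"),
  TemperatureCAvg = c(5, 10, 15, 12, 14, 16)
)

# Median and interquartile range of temperature within each category
medians <- tapply(toy$TemperatureCAvg, toy$category, median)
iqrs <- tapply(toy$TemperatureCAvg, toy$category, IQR)

summary_tbl <- data.frame(
  category = names(medians),
  median = as.numeric(medians),
  iqr = as.numeric(iqrs)
)
print(summary_tbl)
```

A category whose median sits clearly above or below the others corresponds to a violin whose bulk is shifted along the temperature axis.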

Scatter plot

# Create an interactive scatter plot of temperature vs. precipitation,
# with points colored by average temperature
temp_precip_plot <- ggplot(combined_df, aes(x = TemperatureCAvg, y = Precmm, color = TemperatureCAvg)) +
  geom_point() +
  labs(title = "Scatter Plot of Temperature vs. Precipitation",
       x = "Temperature (C)",
       y = "Precipitation (mm)",
       color = "avg temp") +
  theme_minimal()

# Convert ggplot object to Plotly object
temp_precip_plotly <- ggplotly(temp_precip_plot)

# Display the interactive scatter plot
temp_precip_plotly

The scatter plot displays precipitation (mm) on the y-axis and temperature (Celsius) on the x-axis. It suggests that precipitation tends to rise with temperature: the coldest months show little precipitation, whereas warmer months show more. At a temperature of 6°C the precipitation is at its lowest, around 0.07 mm, while at 10.55°C it rises to 3.19 mm. Because the weather values are monthly averages repeated for every crime record in that month, the plot contains only a handful of distinct points, so this trend should be read with caution.
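Since the merged table repeats each month's weather once per crime record, a correlation computed on it weights busy months more heavily. The sketch below illustrates the effect on made-up numbers (the `monthly` values and the `n_crimes` column are hypothetical) and shows one way to fall back to one row per month before calling `cor()`.

```r
# Hypothetical monthly weather averages and number of crimes per month
monthly <- data.frame(
  date = c("2023-01", "2023-02", "2023-03", "2023-04"),
  TemperatureCAvg = c(5, 6, 9, 11),
  Precmm = c(1.0, 0.1, 2.5, 2.0),
  n_crimes = c(3, 1, 2, 2)
)

# Emulate the merged table: each monthly row appears once per crime
repeated <- monthly[rep(seq_len(nrow(monthly)), monthly$n_crimes), ]

# Correlation on the repeated rows (months weighted by crime count)
r_repeated <- cor(repeated$TemperatureCAvg, repeated$Precmm)

# Correlation on one row per month
dedup <- unique(repeated[, c("date", "TemperatureCAvg", "Precmm")])
r_monthly <- cor(dedup$TemperatureCAvg, dedup$Precmm)

print(c(repeated = r_repeated, monthly = r_monthly))
```

With only twelve months available, either coefficient rests on very few independent observations.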

# Create the scatter plot with a smoothing line
scatter_plot <- ggplot(combined_df, aes(x = VisKm, y = Precmm)) +
  geom_point(aes(color = Precmm), alpha = 0.6) +  # Points colored by precipitation amount
  geom_smooth(method = "loess", se = FALSE, color = "blue") +  # LOESS smoothing line without confidence interval
  labs(title = "Scatter Plot of Visibility vs. Precipitation with Trend Line",
       x = "Visibility (Km)", y = "Precipitation (mm)") +
  scale_color_gradient(low = "skyblue", high = "darkblue") +  # Color gradient from light to dark blue
  theme_minimal() +
  theme(legend.position = "none")  # Hide legend for clarity

# Display the scatter plot
scatter_plot
## `geom_smooth()` using formula = 'y ~ x'

This scatter plot shows visibility (km) on the x-axis and precipitation (mm) on the y-axis, together with a LOESS trend line. It suggests a negative relationship between visibility and precipitation: as precipitation intensifies, visibility tends to diminish. This is plausible, since rain or snow obscures the air and reduces long-distance visibility. Some outliers are also noticeable, marking months where the relationship deviates from the overall trend.

Correlation analysis

## Correlation analysis##

# Select the numeric variables you want to include in the correlation analysis
cor_variables <- c("TemperatureCAvg", "Precmm", "WindkmhInt", "PresslevHp", 
                       "TdAvgC", "HrAvg", "WindkmhGust", "TotClOct")

# Compute the correlation matrix
corr_matrix <- cor(combined_df[, cor_variables])

# Create a correlation plot to visualize the relationship between variables
corr_plot <- ggcorrplot(corr_matrix, hc.order = TRUE, lab = TRUE)

# Display the plot
corr_plot

A correlation plot visually represents the strength of the linear relationship between pairs of variables in a dataset. Each variable is positioned along both the x and y axes, with the correlation coefficient between them shown as a color gradient and a numeric value within each cell. A positive correlation, indicated by values closer to +1, suggests that as one variable increases, the other tends to increase as well. Conversely, a negative correlation, depicted by values nearer to -1, implies that as one variable increases, the other typically decreases. A correlation close to zero indicates a weak linear relationship between the variables. The correlation of a variable with itself is always exactly +1, which is why the diagonal is uniform. For example, the correlation between TemperatureCAvg and PresslevHp is negligible, at approximately 0.03.

Time series

##time series##

# Prepare a date column for the time series plot:
# append a day so that "YYYY-MM" becomes "YYYY-MM-01"
combined_df$newdate <- paste0(combined_df$date, "-01")

# Convert the 'newdate' column to Date type
combined_df$newdate <- as.Date(combined_df$newdate, format = "%Y-%m-%d")

# Create a time series plot to visualize multiple weather variables over time
time_series <- ggplot(combined_df, aes(x = newdate)) +
  # Each line represents a different weather variable over time
  geom_line(aes(y = HrAvg, color = "Hourly Avg")) +
  geom_line(aes(y = Precmm, color = "Precipitation")) +
  geom_line(aes(y = VisKm, color = "Visibility")) +
  geom_line(aes(y = WindkmhGust, color = "Wind Gust")) +
  labs(title = "Time Series Plot of Weather Variables",
       x = "Date", y = "Value", color = "Variable") +
  scale_color_manual(values = c("Hourly Avg" = "blue", "Precipitation" = "red", "Visibility" = "green",
                                 "Wind Gust" = "orange")) +
  theme_minimal()

# Display the time series plot
print(time_series)

The time series plot shows how several weather variables changed over 2023. Each line tracks one variable from January to December: HrAvg (labelled "Hourly Avg" in the legend, although in weather exports of this kind the column appears to hold the average relative humidity), precipitation, visibility, and wind gust. The plot makes the seasonal fluctuation of each variable visible and allows their movements to be compared over the year. Temperature is not included in this plot, and because the variables are measured in different units on a shared y-axis, those with small values, such as precipitation, appear almost flat.
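When variables with different units share one axis, a common option is to rescale each to the range 0 to 1 before plotting so that their seasonal shapes can be compared. The sketch below is one such approach under stated assumptions: the `weather` values are made up and `rescale01` is a hypothetical helper, not part of the original analysis.

```r
# Made-up monthly values for three variables in different units
weather <- data.frame(
  month = 1:4,
  Precmm = c(0.5, 2.0, 3.5, 1.0),     # millimetres
  VisKm = c(20, 30, 25, 40),          # kilometres
  WindkmhGust = c(50, 35, 40, 45)     # kilometres per hour
)

# Min-max rescaling: smallest value maps to 0, largest to 1
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

vars <- c("Precmm", "VisKm", "WindkmhGust")
scaled <- weather
scaled[vars] <- lapply(weather[vars], rescale01)
print(scaled)
```

The rescaled values lose their units, so the y-axis should then be labelled as a relative scale rather than in millimetres or kilometres.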

Leaflet map

##leaflet map##

# Convert 'category' to factor
combined_df$category <- factor(combined_df$category)

# Define a vector of 14 distinct colors
colors <- c("darkblue", "lightblue", "darkgreen", "lightgreen", "violet", 
                   "darkorange", "pink", "red", "yellow", "purple",
                   "cyan", "brown", "gray", "orange")

# Create a leaflet map
crime_data_map <- leaflet(combined_df) %>%
  addTiles() %>%  # Add default OpenStreetMap tiles
  addCircleMarkers(
    lng = ~long,    # Longitude
    lat = ~lat,     # Latitude
    color = ~colors[as.integer(category)],  # One color per crime category (indexed by factor level)
    popup = ~category,  # Popup text
    radius = 3,         # Marker radius
    fillOpacity = 0.7   # Marker fill opacity
  ) %>%
  addLegend(
    position = "bottomright",    # Position of the legend
    colors = colors,              # Assign 14 different colors
    labels = levels(combined_df$category)  # Labels for legend
  )

# Display the map
crime_data_map

The map shows where crimes of each category were reported in Colchester: anti-social behavior, bicycle theft, burglary, criminal damage/arson, drugs, other crime, other theft, possession of weapons, public order, robbery, shoplifting, theft from the person, vehicle crime, and violent crime. Each category is denoted by a distinct colored dot on the map, and clicking a dot opens a popup with its category.
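For the dots and the legend to agree, the color of each marker has to be looked up from its category rather than from its row position. The base R sketch below illustrates the difference on a toy factor (all names and values here are for illustration only); a vector that is merely recycled along the rows assigns colors that do not follow the category.

```r
# Toy categories, stored as a factor (levels are sorted alphabetically)
category <- factor(c("drugs", "burglary", "drugs", "robbery"))

# One color per factor level, in level order
palette_by_level <- setNames(c("darkblue", "violet", "red"), levels(category))

# Correct: look the color up by category
marker_colors <- unname(palette_by_level[as.character(category)])

# Incorrect for this purpose: recycle the palette by row position
recycled_colors <- rep_len(unname(palette_by_level), length(category))

print(data.frame(category, marker_colors, recycled_colors))
```

Here both "drugs" rows receive the same color under the lookup, while the recycled vector gives them two different colors.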

Section 3:

Colchester, a lively town steeped in history, grapples with crime-related issues. To tackle these challenges effectively, it’s vital for policymakers, law enforcement, and the community to grasp crime data trends. Our analysis delves into Colchester’s 2023 crime data, focusing on monthly crime distributions.

The data shows that violent crime is a persistent issue, reaching its peak in September with 263 incidents. Property crimes such as shoplifting, other theft, and vehicle crime also occur frequently throughout the year. We notice variation over the year, such as a rise in shoplifting in January and anti-social behavior peaking in September. These findings help target resources strategically and focus on preventing crime. By concentrating on problem areas and taking proactive measures, the aim is to reduce criminal activity and improve community safety. Collaboration and data-driven decisions are essential for addressing these challenges effectively, with the objective of a safer and stronger community for everyone in Colchester.

The bar graph shows crime prevalence by category, with violent crime topping the list at 2633 incidents. Anti-social behavior (677) and criminal damage/arson (581) come next, but at roughly a quarter of that level, which reflects both the dominance of violent crime and the varied nature of the remaining offenses. Understanding these patterns helps policymakers prioritize resources effectively to combat crime and boost public safety.

The temperature density graph shows interesting patterns in how temperatures are spread out, which can help policymakers understand weather changes. The graph covers temperatures from 5 to 17.5 degrees Celsius and shows peaks and valleys that need a closer look. For example, the highest concentration of temperatures is around 17 degrees Celsius, suggesting this is a common weather range. On the other hand, there are fewer temperature readings around 9.8 degrees Celsius, indicating less typical weather patterns. By grasping these temperature fluctuations, policymakers can predict weather challenges more accurately and create specific plans to deal with them in communities.

The correlation plot gives policymakers a clear picture of how the weather variables in the dataset relate to each other. Color gradients and numbers show how strongly pairs of variables are connected and in what direction. A positive correlation close to +1 means that when one variable goes up, the other tends to go up too, while a negative correlation near -1 means they move in opposite directions. Correlations close to zero indicate little linear relationship between the variables. Every variable has a correlation of exactly +1 with itself, while correlations close to zero, like the one between TemperatureCAvg and PresslevHp at around 0.03, show very little connection. These correlations indicate which variables move together; they do not establish causes, but they help identify relationships worth investigating further.

The map is a helpful tool for policymakers, showing where different types of crimes happen in Colchester. Each type of crime, like anti-social behavior or violent crime, is shown with a different colored dot on the map. This helps policymakers see where crimes are most common and where they need to focus their efforts. By studying this spatial data, policymakers can spot hotspots and trends in criminal activity, allowing them to target interventions and allocate resources where they’re needed most. Understanding the geography of crime helps policymakers create specific strategies to make Colchester safer and reduce crime in the area.

Conclusion:

In conclusion, our look at crime and weather data in Colchester offers useful insights for policymakers, law enforcement, and the community. We used visual tools such as bar graphs, density plots, violin plots, scatter plots, correlation plots, time series, and a map to uncover patterns and trends. The high occurrence of violent crime throughout the year shows the need for specific actions to tackle this ongoing problem. The ups and downs over the year in crimes such as bicycle theft and anti-social behavior suggest that strategies should be adjusted to the time of year. Our study of the weather data also explored possible connections between weather conditions and crime, which supports a combined approach to crime prevention. By using insights from both crime and weather data, policymakers can create effective plans to keep communities safe and resilient. Overall, the analysis underlines the value of using data to make informed decisions on complex issues like crime, leading to safer neighborhoods in Colchester and beyond.

References

1. Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. ISBN 978-3-319-24277-4. https://ggplot2.tidyverse.org

2. R Core Team. (2023). R: A Language and Environment for Statistical Computing [Computer software]. R Foundation for Statistical Computing. https://www.R-project.org/